22 research outputs found

    Position Paper on Dataset Engineering to Accelerate Science

    Data is a critical element in any discovery process. In recent decades, we have observed exponential growth in the volume of available data and in the technology to manipulate it. However, data is only practical when one can structure it for a well-defined task. For instance, we need a corpus of text broken into sentences to train a natural language machine-learning model. In this work, we use the term "dataset" to designate a structured set of data built to perform a well-defined task. Moreover, in most cases we treat the dataset as a blueprint of an entity that can be stored as a table at any moment. In science specifically, each area has its own ways to organize, gather, and handle its datasets. We believe that datasets must be a first-class entity in any knowledge-intensive process, and that all workflows should pay particular attention to the datasets' lifecycle, from their gathering to their use and evolution. We argue that science and engineering discovery processes are extreme instances of the need for such organization of datasets, calling for new approaches and tooling. These requirements are even more evident when the discovery workflow uses artificial intelligence methods to empower the subject-matter expert. In this work, we discuss an approach to making datasets a critical entity in the scientific discovery process. We illustrate some concepts using material discovery as a use case. We chose this domain because it involves many significant problems that can be generalized to other fields of science. Comment: Published at 2nd Annual AAAI Workshop on AI to Accelerate Science and Engineering (AI2ASE) https://ai-2-ase.github.io/papers/16%5cSubmission%5cAAAI_Dataset_Engineering-8.pd

    Iron Behaving Badly: Inappropriate Iron Chelation as a Major Contributor to the Aetiology of Vascular and Other Progressive Inflammatory and Degenerative Diseases

    The production of peroxide and superoxide is an inevitable consequence of aerobic metabolism, and while these particular "reactive oxygen species" (ROSs) can exhibit a number of biological effects, they are not of themselves excessively reactive and thus they are not especially damaging at physiological concentrations. However, their reactions with poorly liganded iron species can lead to the catalytic production of the very reactive and dangerous hydroxyl radical, which is exceptionally damaging, and a major cause of chronic inflammation. We review the considerable and wide-ranging evidence for the involvement of this combination of (su)peroxide and poorly liganded iron in a large number of physiological and indeed pathological processes and inflammatory disorders, especially those involving the progressive degradation of cellular and organismal performance. These diseases share a great many similarities and thus might be considered to have a common cause (i.e. iron-catalysed free radical and especially hydroxyl radical generation). The studies reviewed include those focused on a series of cardiovascular, metabolic and neurological diseases, where iron can be found at the sites of plaques and lesions, as well as studies showing the significance of iron to aging and longevity. The effective chelation of iron by natural or synthetic ligands is thus of major physiological (and potentially therapeutic) importance. Taking a systems view, we need to recognise that physiological observables have multiple molecular causes, and that studying them in isolation leads to inconsistent patterns of apparent causality when it is the simultaneous combination of multiple factors that is responsible. This explains, for instance, the decidedly mixed effects of antioxidants that have been observed. Comment: 159 pages, including 9 Figs and 2184 references

    Exploring Concurrency in Computer Clusters (Explorando a concorrência em agregados de computadores)


    An architecture of structured data stream analysis applied to the Brazilian digital TV system.

    Various computing systems transmit information in continuous streams of structured, and sometimes hierarchically organized, data. This transmission model is characterized by a high density of information, which requires the receiver to process the units extracted from the communication channel immediately. The transmission volume also often prevents the receiver from storing the received information permanently, which makes analyzing the content of these data streams a challenge. This work presents an architecture for structured data stream analysis applied to the logical hierarchy defined by the Brazilian Digital TV System for the transmission of television programs, validated by means of a fully functional reference implementation.

    Online Algorithms for the Linear Tape Scheduling Problem

    Even in today’s world of increasingly fast storage technologies, magnetic tapes continue to play an essential role in the market. Yet, they are often overlooked in the literature, despite the many changes made to the underlying tape architecture since they were conceived. In this article, we introduce the LINEAR TAPE SCHEDULING PROBLEM (LTSP), which aims to identify scheduling strategies for read and write operations in single-tracked magnetic tapes that minimize the overall response times for read requests. Structurally, LTSP has many similarities with versions of the Travelling Repairman Problem and of the Dial-a-Ride Problem restricted to the real line. We investigate several properties of LTSP and show how they can be exploited in the design of algorithms for the online version of the problem. Computational experiments show that the resulting strategies deliver very satisfactory scheduling plans, which in most cases are clearly superior (potentially differing by one order of magnitude) to those produced by a strategy currently used in the industry.